Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Unaccent with dandiset search filter #2142

Merged
merged 4 commits into from
Jan 28, 2025
Merged

Use Unaccent with dandiset search filter #2142

merged 4 commits into from
Jan 28, 2025

Conversation

jjnesbitt
Copy link
Member

@jjnesbitt jjnesbitt commented Jan 17, 2025

Closes #1941

This change normalized accent characters when searching dandisets. For example (from that issue), the words Buzsaki and Buzsáki would now resolve to the same word in search.

It seems the introduction of this change has no real impact on the query performance. In fact, perhaps due to now using a less complex search function overall (since we're no longer using the DRF SearchFilter class), the performance seems to be ever so slightly better.

Postgres docs for the unaccent extension: https://www.postgresql.org/docs/current/unaccent.html
Django docs for unaccent: https://docs.djangoproject.com/en/5.1/ref/contrib/postgres/lookups/#unaccent

Of note, we can't simply use the __unaccent__ lookup without any other changes, as is shown in the Django docs, because the metadata field is a JSONField, and that lookup only works on Charfield and Textfield.

@kabilar
Copy link
Member

kabilar commented Jan 17, 2025

Thanks @jjnesbitt. cc @bendichter

@jjnesbitt
Copy link
Member Author

Just realized I never wrote any tests for this. I'll do that.

@waxlamp waxlamp marked this pull request as draft January 21, 2025 23:41
@bendichter
Copy link
Member

Would you mind including a screenshot of a search for "Buzsaki" with this new change?

@jjnesbitt
Copy link
Member Author

jjnesbitt commented Jan 27, 2025

Would you mind including a screenshot of a search for "Buzsaki" with this new change?

Sure, here that is. Update: @waxlamp pointed out that there's multiple results for the same dandiset in this image. It seems an entry is being returned for every version in that dandiset. I'll work to fix that now.

Okay I've addressed that problem. Below is the updated screenshot.

Screenshot from 2025-01-27 15-10-05

@jjnesbitt jjnesbitt marked this pull request as ready for review January 27, 2025 22:13
@bendichter
Copy link
Member

This looks right to me

Copy link
Member

@mvandenburgh mvandenburgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jjnesbitt jjnesbitt added patch Increment the patch version when merged release Create a release when this pr is merged labels Jan 28, 2025
@jjnesbitt jjnesbitt merged commit 1290ea6 into master Jan 28, 2025
11 checks passed
@jjnesbitt jjnesbitt deleted the unaccent-search branch January 28, 2025 19:45
@dandibot
Copy link
Member

🚀 PR was released in v0.4.14 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Jan 28, 2025
@bendichter
Copy link
Member

Thanks for pushing this through, @jjnesbitt !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Searching for names with accented characters.
6 participants